STAT 679: Problem Set #2

Q1. America Time Use Survey

Part (a)

Skim the data without visualizing it. Write three questions for follow-up analysis. Among these, at least one should compare multiple activities with one another, and at least one should compare time-points within a single activity

  1. Among all activities, which are the ones that people mostly engage in than any other?
  2. Between what times did more than 90% of the people engage in martial arts?
  3. Between noon, did people prefer hunting or dancing?
  4. What activities did a majority (>50%) of the people do between 12PM till 6PM on New Year’s Day?

Part (b)

Make a plot of prop_smooth over time for each activity. Justify your choice of visual encoding – what questions does it help answer efficiently?

In the below visualization, I have encoded the activities in the color encoding of the line plot.

activities <- read.csv("https://raw.githubusercontent.com/krisrs1128/stat992_f23/main/exercises/ps2/activity.csv") %>% 
  group_by(activity) %>% 
  mutate(
    time = as.POSIXct(time, tryFormats="%Y-%m-%dT%H:%M:%SZ"), 
    ps_percentage = prop_smooth*100
  )

step_val = activities$time[2] - activities$time[1]
ggplot(activities, aes(x=time)) + 
  geom_step(aes(y = ps_percentage)) +
  geom_step(aes(y = 50)) +
  geom_rect(aes(xmin = time, xmax = time + step_val, ymin = 50, ymax = ps_percentage, fill = ps_percentage - 50)) +
  facet_wrap(~ reorder(activity, -prop), ncol = 1, strip.position = 'left') + 
  labs(title = "Activities Americans did on New Year's day", 
       x = 'Time', y = '', fill = 'Percentage of people') + 
  scale_x_datetime(expand = c(0,0)) +
  scale_y_continuous(expand = c(0,0,0.1,0)) +
  scale_fill_gradient2(
    low = "violetred3", high = "chartreuse4", limits = c(-50,50),
    breaks = seq(-50,50,10), labels = seq(0,100,10), 
    guide = guide_colorbar(barwidth = 20, barheight = 1)
    ) +
  theme(strip.text.y.left = element_text(angle=0, hjust=1), axis.text.y = element_blank())

Part (c)

Create an alternative visualization using a different encoding. For example, you may (but do not have to) use a heatmap, horizon or ridge line plot. Compare the trade-offs involved between the two encodings. What questions are easier to answer using your visualization from (b), and which are easier to visualize using your visualization from (c)?

I chose to make a Horizon Plot for the given data. Fortunately the visualizations from both (b) and (c) help answer all the questions I stated in part-(a). But the horizon plot has an additional advantage of utilizing the full plot area so it is easier to interpret.

cutpoints <- seq(0,100,by=10)
ggplot(activities) + 
  geom_horizon(aes(time, ps_percentage, fill = after_stat(Cutpoints)), origin = 50, horizonscale = cutpoints) + 
  facet_wrap(~ reorder(activity, -prop), ncol = 1, strip.position = 'left') + 
  scale_x_datetime(expand = c(0,0)) +
  scale_y_continuous(expand = c(0,0)) +
  scale_fill_hcl(palette = "PiYG") +
  labs(title = "Activities Americans did on New Year's day", 
       x = 'Time', y = '', fill = 'Percentage of people') + 
  guides(fill = guide_legend(byrow = T, reverse = T))+
  theme(strip.text.y.left = element_text(angle=0, hjust=1), axis.text.y = element_blank())

Q2. Midwestern Power Plants

Part (a)

Create a map of power plants that shows where plants are located, how they generate electricity (primary_fuel), and how much generation capacity they have (capacity_mw).

plants <- read_sf("https://raw.githubusercontent.com/krisrs1128/stat992_f23/main/exercises/ps2/power_plants.geojson")

basemap <- cc_location(loc= c(-89.63490, 42.90875), buffer = 10e5)

# tm_shape(basemap) +
#   tm_rgb(1,1,1) +
  tm_shape(plants) + 
  tm_dots(
    size='capacity_mw',
    col='primary_fuel',
    palette="Set2",
    title = "Fuel Type",
    title.size = "Capacity (in MW)"
    ) +
  tm_layout(
    legend.outside = T,
    legend.outside.position = "right"
    )

Part (b)

update_map <- function(df, selected_) {
  selected_data <- df %>% filter(selected_)
  tm_shape(basemap) +
    tm_rgb(3,2,1,1,0.25) +
    tm_shape(selected_data) + 
    tm_dots(
      size='log_capacity', 
      col='primary_fuel', 
      palette="Set2",
      legend.show = F, 
      legend.size.show = F
      )
}

update_histogram <- function(df, selected_) {
  selected_data <- df %>% filter(selected_)
  ggplot(NULL, aes(log_capacity, col= primary_fuel, fill=primary_fuel)) + 
    geom_histogram(data=df, alpha = 0.3) +
    geom_histogram(data=selected_data) +
    scale_y_continuous(expand = c(0,0,0.1,0)) +
    scale_color_brewer(palette = "Set2", guide="none") + 
    scale_fill_brewer(palette = "Set2") + 
    labs(
      x = "Capacity of the Power Plant in Mega Watts",
      y = "Number of Power Plants", 
      fill = "Type of fuel"
    ) + 
    theme(legend.position = "right")
}

ui <- fluidPage(
  h3("Mid-Western Power Plants"),
  fluidRow(
    column(6,
      h4("Brush over the plot to interact with the map"),
      plotOutput("stacked_histogram", brush = brushOpts("plot_brush", direction = "x"))
    ),
    column(6, plotOutput("map", height = 600)),
  ),
  theme = bs_theme(bootswatch = "minty")
)

server <- function(input, output) {
  selected <- reactiveVal(rep(TRUE, nrow(plants)))
  observeEvent(input$plot_brush, {
    selected(brushedPoints(plants, input$plot_brush, allRows = TRUE)$selected_)
  })
  
  output$stacked_histogram <- renderPlot(update_histogram(plants, selected()))
  output$map <- renderPlot(update_map(plants, selected()))
  # output$table <- renderDataTable(filter_df(rentals, selected()))
}

# shinyApp(ui, server)

Part (c)

Describe one strength and one limitation of the visualization generated in part (b). Consider one visual query for which it is poorly suited, and discuss (but do not implement) and alternative.

This interactive visualization will help select the locations that fall under a particular capacity, but the locations are not labelled in the map. We cannot identify the plant from the map. An alternative would be to either add the location labels, or include a data table at the bottom.

Q3. Random Point Transitions

Part (a),(b)

Encode the data in x using the x-coordinate positions of 10 circles. Animate the circles. Specifically, at fixed time intervals, generate a new set of 10 numbers, and smoothly transition the original set of circles to locations corresponding to these new numbers.

random_pt_transitions.html

<!DOCTYPE html>
<html lang="en" dir="ltr">
  <head>
    <script src="https://d3js.org/d3.v7.min.js"></script>
    <script src="https://cdn.jsdelivr.net/npm/d3-random@3"></script>
    <script src="https://cdn.jsdelivr.net/npm/d3-ease@3"></script>
    <script src="https://d3js.org/d3-selection-multi.v1.min.js"></script>
    <meta charset="utf-8">
    <title></title>
  </head>
  <body>
    <svg width='1600px', height='800px'></svg>
  </body>
    <script src="random_pt_transitions.js"></script>
</html>

random_pt_transitions.js

let generator = d3.randomUniform(100,1400);
let x_coords = d3.range(10).map(generator);

circle_data = [];
for (var i = 10; i < 20; i++) {
  circle_data.push({x: x_coords[i], r: i});
}

d3.select("svg")
  .selectAll("circle")
  .data(circle_data).enter()
  .append("circle")
  .attrs({
    cx: cd => cd.x,
    cy: 400,
    r: cd => cd.r, 
    opacity: 0.5
  })

function animate() {
  circle_data = circle_data.map(cd => { return { x: generator() } });
  d3.selectAll("circle")
    .data(circle_data)
    .transition()
    .duration(2000)
    .ease(d3.easeLinear)
    .attrs({
      cx: cd => cd.x
    })
    d3.timeout(animate, 1000)
}

animate()

Output:

Part (c)

Extend your animation so that at least one other attribute is changed at each time step. For example, you may consider changing the color or the size of the circles. Make sure that transitions remain smooth (e.g., if transitioning size, gradually increase or decrease the circles’ radii).

Here, I made the radii of the circles increase gradually along with their x-coordinates

random_pt_transitions.js

function animate(t) {
  circle_data = circle_data.map(cd => { return { x: generator(), r: cd.r, rnew: (1 + Math.sin(t/10)) * cd.r } });
  d3.selectAll("circle")
    .data(circle_data)
    .transition()
    .duration(2000)
    .ease(d3.easeLinear)
    .attrs({
      cx: cd => cd.x,
      r: cd => cd.rnew
    })
    d3.timeout(() => { animate(t+1) }, 500)
}

animate(0)

Output:

Q4. Bar Chart Transitions

This problem continues [Simple Bar Chart] above. We will create a bar chart that adds and removes one bar each time a button is clicked. Specifically, the function below takes an initial array x and creates a new array that removes the first element and adds a new one to the end. Using D3’s generate update pattern, write a function that updates the visualization from [Simple bar chart] every time that update_data() is called. New bars should be entered from the left, exited from the right, and transitioned after each click.

bar-transitions.html

<!DOCTYPE html>
<html lang="en" dir="ltr">
  <head>
    <script src="https://d3js.org/d3.v7.min.js"></script>
    <script src="https://d3js.org/d3-selection-multi.v1.min.js"></script>
    <meta charset="utf-8">
    <title></title>
    <style >
        div { padding: 30%; }
        svg { width:  1600px; height: 400px; }
        #btn-transition {
            display: inline-block; padding: 10px 20px; background-color: #cbcbcb;
            border: none; border-radius: 5px; cursor: pointer; font-size: 16px;
            transition: background-color 0.75s; margin-left: 30%;
        }
        #btn-transition:hover { background-color: #353535; color: #fff; }
    </style>
  </head>
  <body>
    <div>
        <svg></svg>
        <button id="btn-transition">Click me to move the bars!</button>
    </div>
  </body>
    <script src="bar-transitions.js"></script>
</html>

bar-transitions.js

let generator = d3.randomUniform(150,300);
bar_data = []
let id = 0

function assign_fill(x) {
    switch(x%4) {
        case 0: return `hsl(203deg 96% 45%)`
        case 1: return `hsl(257deg 96% 45%)`
        case 2: return `hsl(77deg 96% 40%)`
        case 3: return `hsl(311deg 96% 35%)`
    }
}

function move_bars() {
    bar_data = bar_data.map(d => { return {id: d.id, life: d.life + 1, height: d.height }})
    bar_data.push({life: 0, height: generator(), id: id});
    bar_data = bar_data.filter(d => d.life < 11)
    id+=1

    d3.select("svg")
      .selectAll("rect")
      .data(bar_data, bd=> bd.id)
      .join (
        enter => enter.append("rect").transition(500)
                      .attrs({
                        height: 0,
                        width: 50,
                        x: 0,
                        y: 500,
                        fill: bd => assign_fill(bd.id),
                        opacity: 0.75
                      }),
        update => update.transition(500)
                        .attrs({
                            x: bd => (bd.life-1)*55,
                            y: bd => 500 - bd.height,
                            height: bd => bd.height,
                            width: 50
                        }),
        exit => exit.transition(500)
                    .attrs({y: 500})
                    .remove()
      )
}

move_bars()

Output: